Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

Abstract

We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression. Also, by representing features in terms of histograms, our approach can naturally address documents of varying lengths. An empirical evaluation using the Common Core Standards corpus reveals that the features formed on our clustering-based language model significantly improve the previously known results for the same corpus in readability prediction. We also evaluate the task of sentence matching based on semantic relatedness using the Wiki-SimpleWiki corpus and find that our features lead to superior matching performance.
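
The idea of clustering word embeddings and describing each document by a histogram over clusters can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the embedding model, or the number of clusters, so everything below is an assumption made for illustration: plain k-means under Euclidean distance with a deterministic farthest-point initialisation, hand-made 2-D toy vectors in place of trained embeddings, and the hypothetical helper names `kmeans` and `histogram_features`. It is not the authors' implementation.

```python
import numpy as np


def kmeans(points, k, iters=20):
    """Lloyd's k-means with Euclidean distance (assumed clustering method).

    Initialisation is deterministic: start from the first point, then
    repeatedly add the point farthest from the centroids chosen so far.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(iters):
        # Distance from every point to every centroid: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def histogram_features(tokens, word_to_cluster, k):
    """Normalised histogram of cluster IDs for one document.

    The output always has length k, whatever the document length.
    Tokens without an embedding are skipped (an assumption).
    """
    counts = np.zeros(k)
    for tok in tokens:
        cluster = word_to_cluster.get(tok)
        if cluster is not None:
            counts[cluster] += 1
    total = counts.sum()
    return counts / total if total else counts


# Toy 2-D "embeddings" (hypothetical values, not trained vectors).
words = ["cat", "dog", "bird", "theorem", "lemma", "proof"]
vectors = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.3],   # everyday words
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.3],   # mathematical words
])

k = 2
centroids, labels = kmeans(vectors, k)
word_to_cluster = {w: int(c) for w, c in zip(words, labels)}

# Two documents of different lengths map to vectors of the same size.
doc_short = ["cat", "dog", "theorem"]
doc_long = ["theorem", "lemma", "proof", "proof", "cat", "unknown"]
h_short = histogram_features(doc_short, word_to_cluster, k)
h_long = histogram_features(doc_long, word_to_cluster, k)
print(h_short, h_long)
```

Because each histogram has one bin per cluster, documents of any length yield feature vectors of the same dimension, which could then be passed to a regressor of the reader's choice for readability prediction.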